Korean-Chinese Person Name Translation for Cross Language Information Retrieval
نویسندگان
چکیده
Named entity translation plays an important role in many applications, such as information retrieval and machine translation. In this paper, we focus on translating person names, the most common type of name entity in Korean-Chinese cross language information retrieval (KCIR). Unlike other languages, Chinese uses characters (ideographs), which makes person name translation difficult because one syllable may map to several Chinese characters. We propose an effective hybrid person name translation method to improve the performance of KCIR. First, we use Wikipedia as a translation tool based on the inter-language links between the Korean edition and the Chinese or English editions. Second, we adopt the Naver people search engine to find the query name’s Chinese or English translation. Third, we extract Korean-English transliteration pairs from Google snippets, and then search for the English-Chinese transliteration in the database of Taiwan’s Central News Agency or in Google. The performance of KCIR using our method is over five times better than that of a dictionary-based system. The mean average precision is 0.3490 and the average recall is 0.7534. The method can deal with Chinese, Japanese, Korean, as well as non-CJK person name translation from Korean to Chinese. Hence, it substantially improves the performance of KCIR.
منابع مشابه
KUNLP System for NTCIR-3 English-Korean Cross-Language Information Retrieval
This paper describes KUNLP system for the English-Korean cross-language information retrieval track in NTCIR-3 workshop and some experiments after the workshop. Query translation method based on the bilingual dictionary and the document language corpus was used. To automatically transliterate some proper nouns such as Korean person names, Korean place names, and Korean company names, we have co...
متن کاملLearning Patterns from the Web to Translate Named Entities for Cross Language Information Retrieval
Named entity (NE) translation plays an important role in many applications. In this paper, we focus on translating NEs from Korean to Chinese to improve Korean-Chinese cross-language information retrieval (KCIR). The ideographic nature of Chinese makes NE translation difficult because one syllable may map to several Chinese characters. We propose a hybrid NE translation system. First, we integr...
متن کاملProper Name Translation in Cross-Language Information Retrieval
Recently, language barrier becomes the major problem for people to search, retrieve, and understand WWW documents in different languages. This paper deals with query translation issue in cross-language information retrieval, proper names in particular. Models for name identification, name translation and name searching are presented. The recall rates and the precision rates for the identificati...
متن کاملNTCIR-4 Chinese, English, Korean Cross Language Retrieval Experiments Using PIRCS
In NTCIR-4 we participated in Korean, Chinese, English monolingual, Chinese-English, EnglishKorean bilingual, and Chinese-Korean cross language (using English as pivot) retrieval tasks based on our PIRCS retrieval system. The query translation approach was employed for CLIR. We combined two MT translations for Chinese-English, and two for English-Korean. For the latter, a webbased entity-orient...
متن کاملNTCIR-5 Chinese, English, Korean Cross Language Retrieval Experiments using PIRCS
In NTCIR-5 our focus is to see if web-assisted query expansion is useful, and to test an EnglishKorean bilingual dictionary. We participated in Chinese, Japanese, Korean and English monolingual retrieval using also web expansion for Chinese and English. We also performed Chinese-English, English-Chinese, English-Korean bilingual, and Chinese-Korean pivot bilingual CLIR. The query translation ap...
متن کامل